knitr::include_graphics("group.gif")Ordinal Assignment
1 Introduction
The dataset contains 4,092 rows and 21 features collected from community-based health screening data. One of the key variables of interest is fasting blood sugar (FBS), which will be categorized into three clinical classes:
Normal (FBS < 5.6 mmol/L)
Prediabetes (FBS 5.6–6.9 mmol/L)
Diabetes (FBS ≥ 7.0 mmol/L)
The dataset includes a variety of demographic, anthropometric, and biochemical features such as:
Demographic data: age, gender, rural/urban locality
Lifestyle indicators: smoking status, hypertension history
Anthropometric measures: height, weight, waist and hip circumference
Blood pressure: mean systolic and diastolic blood pressure
Biochemical values: HbA1c, cholesterol profiles (HDL, LDL, triglycerides, total cholesterol), and oral glucose tolerance test (OGTT) values
This dataset is suitable for exploring predictors of abnormal fasting blood sugar levels using ordinal logistic regression.
2 Preparation
library(here)
library(tidyverse)
library(haven)
library(gtsummary)
library(VGAM)
library(nnet)
library(broom)
library(knitr)
library(kableExtra)
library(tibble)
library(purrr)
library(gt)
library(ggplot2)
library(ggeffects)
library(reshape2)
library(data.table)
library(ordinal)
library(foreign)
library(brant)
library(patchwork)3 Read Data
datafbs.o <- read_csv("datamssm_b.csv")
summary(datafbs.o) codesub age hpt smoking
Length:4092 Min. :18.00 Length:4092 Length:4092
Class :character 1st Qu.:38.00 Class :character Class :character
Mode :character Median :48.00 Mode :character Mode :character
Mean :48.05
3rd Qu.:58.00
Max. :89.00
dmdx height weight waist
Length:4092 Min. :1.270 Min. : 30.00 Min. : 50.80
Class :character 1st Qu.:1.510 1st Qu.: 53.80 1st Qu.: 77.00
Mode :character Median :1.560 Median : 62.30 Median : 86.00
Mean :1.568 Mean : 63.92 Mean : 86.44
3rd Qu.:1.630 3rd Qu.: 72.00 3rd Qu.: 95.00
Max. :1.960 Max. :187.80 Max. :154.50
NA's :1 NA's :1 NA's :2
hip msbpr mdbpr hba1c
Min. : 61.00 Min. : 68.5 Min. : 41.50 Min. : 0.200
1st Qu.: 91.00 1st Qu.:117.0 1st Qu.: 70.00 1st Qu.: 5.100
Median : 97.00 Median :130.0 Median : 78.00 Median : 5.400
Mean : 97.92 Mean :133.9 Mean : 78.56 Mean : 5.829
3rd Qu.:104.00 3rd Qu.:147.0 3rd Qu.: 86.00 3rd Qu.: 5.800
Max. :160.00 Max. :237.0 Max. :128.50 Max. :15.000
NA's :2 NA's :22
fbs mogtt1h mogtt2h totchol
Min. : 2.500 Min. : 0.160 Min. : 0.160 Min. : 0.180
1st Qu.: 4.480 1st Qu.: 6.660 1st Qu.: 5.240 1st Qu.: 4.970
Median : 5.180 Median : 8.660 Median : 6.680 Median : 5.700
Mean : 5.734 Mean : 9.212 Mean : 7.436 Mean : 5.790
3rd Qu.: 6.020 3rd Qu.:10.910 3rd Qu.: 8.480 3rd Qu.: 6.525
Max. :28.010 Max. :41.500 Max. :37.370 Max. :23.140
NA's :495 NA's :497 NA's :5
ftrigliz hdl ldl gender
Min. : 0.110 Min. :0.080 Min. : 0.140 Length:4092
1st Qu.: 0.930 1st Qu.:1.110 1st Qu.: 2.800 Class :character
Median : 1.260 Median :1.320 Median : 3.460 Mode :character
Mean : 1.541 Mean :1.349 Mean : 3.544
3rd Qu.: 1.780 3rd Qu.:1.550 3rd Qu.: 4.230
Max. :12.660 Max. :4.430 Max. :10.560
NA's :4 NA's :3 NA's :4
crural
Length:4092
Class :character
Mode :character
glimpse(datafbs.o)Rows: 4,092
Columns: 21
$ codesub <chr> "AHA215459", "LAW215133", "SAL215736", "MJB216145", "TIS22632…
$ age <dbl> 44, 64, 41, 34, 58, 39, 48, 41, 34, 22, 38, 47, 33, 20, 51, 4…
$ hpt <chr> "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "…
$ smoking <chr> "still smoking", "never smoked", "never smoked", "still smoki…
$ dmdx <chr> "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "…
$ height <dbl> 1.63, 1.59, 1.46, 1.52, 1.48, 1.53, 1.48, 1.52, 1.55, 1.65, 1…
$ weight <dbl> 52.65, 55.30, 49.90, 48.70, 43.35, 60.10, 70.70, 63.65, 50.00…
$ waist <dbl> 71.0, 83.5, 70.0, 75.0, 78.1, 83.0, 102.6, 90.0, 73.0, 75.0, …
$ hip <dbl> 85.0, 89.9, 88.0, 86.7, 87.8, 93.5, 102.9, 104.0, 87.0, 90.0,…
$ msbpr <dbl> 124.5, 157.5, 115.5, 111.0, 140.5, 160.0, 119.5, 150.0, 110.5…
$ mdbpr <dbl> 73.5, 68.0, 64.5, 68.5, 68.5, 99.5, 76.0, 82.0, 73.5, 79.0, 7…
$ hba1c <dbl> 4.2, 5.3, 5.0, 5.1, 5.2, 5.9, 5.5, 4.4, 5.1, 4.9, 5.3, 5.5, 5…
$ fbs <dbl> 2.50, 2.50, 2.51, 2.52, 2.52, 2.52, 2.52, 2.52, 2.55, 2.55, 2…
$ mogtt1h <dbl> 2.04, 9.99, 7.82, 5.67, 10.28, 11.44, 6.31, 9.00, 5.03, 4.30,…
$ mogtt2h <dbl> 3.30, 8.44, 6.42, 5.45, 4.56, 6.95, 5.58, 7.07, 4.55, 5.34, 7…
$ totchol <dbl> 4.18, 5.27, 5.10, 5.55, 6.72, 6.69, 7.28, 5.74, 3.48, 4.08, 4…
$ ftrigliz <dbl> 1.20, 1.28, 0.68, 0.86, 1.13, 1.59, 2.18, 1.24, 0.26, 0.96, 0…
$ hdl <dbl> 1.07, 0.99, 1.47, 1.49, 1.46, 1.28, 1.20, 1.22, 1.04, 0.81, 1…
$ ldl <dbl> 2.43, 3.64, 2.73, 3.56, 4.54, 4.08, 5.02, 3.70, 2.01, 2.34, 2…
$ gender <chr> "male", "male", "female", "male", "female", "female", "female…
$ crural <chr> "rural", "rural", "rural", "rural", "urban", "rural", "rural"…
4 Convert character to factor
datafbs.o <- datafbs.o %>%
mutate(across(where(is.character), as.factor))
glimpse(datafbs.o)Rows: 4,092
Columns: 21
$ codesub <fct> AHA215459, LAW215133, SAL215736, MJB216145, TIS226328, red115…
$ age <dbl> 44, 64, 41, 34, 58, 39, 48, 41, 34, 22, 38, 47, 33, 20, 51, 4…
$ hpt <fct> no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, n…
$ smoking <fct> still smoking, never smoked, never smoked, still smoking, nev…
$ dmdx <fct> no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, n…
$ height <dbl> 1.63, 1.59, 1.46, 1.52, 1.48, 1.53, 1.48, 1.52, 1.55, 1.65, 1…
$ weight <dbl> 52.65, 55.30, 49.90, 48.70, 43.35, 60.10, 70.70, 63.65, 50.00…
$ waist <dbl> 71.0, 83.5, 70.0, 75.0, 78.1, 83.0, 102.6, 90.0, 73.0, 75.0, …
$ hip <dbl> 85.0, 89.9, 88.0, 86.7, 87.8, 93.5, 102.9, 104.0, 87.0, 90.0,…
$ msbpr <dbl> 124.5, 157.5, 115.5, 111.0, 140.5, 160.0, 119.5, 150.0, 110.5…
$ mdbpr <dbl> 73.5, 68.0, 64.5, 68.5, 68.5, 99.5, 76.0, 82.0, 73.5, 79.0, 7…
$ hba1c <dbl> 4.2, 5.3, 5.0, 5.1, 5.2, 5.9, 5.5, 4.4, 5.1, 4.9, 5.3, 5.5, 5…
$ fbs <dbl> 2.50, 2.50, 2.51, 2.52, 2.52, 2.52, 2.52, 2.52, 2.55, 2.55, 2…
$ mogtt1h <dbl> 2.04, 9.99, 7.82, 5.67, 10.28, 11.44, 6.31, 9.00, 5.03, 4.30,…
$ mogtt2h <dbl> 3.30, 8.44, 6.42, 5.45, 4.56, 6.95, 5.58, 7.07, 4.55, 5.34, 7…
$ totchol <dbl> 4.18, 5.27, 5.10, 5.55, 6.72, 6.69, 7.28, 5.74, 3.48, 4.08, 4…
$ ftrigliz <dbl> 1.20, 1.28, 0.68, 0.86, 1.13, 1.59, 2.18, 1.24, 0.26, 0.96, 0…
$ hdl <dbl> 1.07, 0.99, 1.47, 1.49, 1.46, 1.28, 1.20, 1.22, 1.04, 0.81, 1…
$ ldl <dbl> 2.43, 3.64, 2.73, 3.56, 4.54, 4.08, 5.02, 3.70, 2.01, 2.34, 2…
$ gender <fct> male, male, female, male, female, female, female, female, fem…
$ crural <fct> rural, rural, rural, rural, urban, rural, rural, urban, rural…
5 Data Wrangling
The predictors that will be selected in this dataset are hba1c, triglycerides, age, waist circumference, smoking status, and gender.
datafbs.o <- datafbs.o %>%
select(fbs, hba1c, ftrigliz, age, waist, smoking, gender)
# inspect the selected data
glimpse(datafbs.o)Rows: 4,092
Columns: 7
$ fbs <dbl> 2.50, 2.50, 2.51, 2.52, 2.52, 2.52, 2.52, 2.52, 2.55, 2.55, 2…
$ hba1c <dbl> 4.2, 5.3, 5.0, 5.1, 5.2, 5.9, 5.5, 4.4, 5.1, 4.9, 5.3, 5.5, 5…
$ ftrigliz <dbl> 1.20, 1.28, 0.68, 0.86, 1.13, 1.59, 2.18, 1.24, 0.26, 0.96, 0…
$ age <dbl> 44, 64, 41, 34, 58, 39, 48, 41, 34, 22, 38, 47, 33, 20, 51, 4…
$ waist <dbl> 71.0, 83.5, 70.0, 75.0, 78.1, 83.0, 102.6, 90.0, 73.0, 75.0, …
$ smoking <fct> still smoking, never smoked, never smoked, still smoking, nev…
$ gender <fct> male, male, female, male, female, female, female, female, fem…
#confirm the reference group
levels(datafbs.o$smoking)[1] "never smoked" "quitted smoking" "still smoking"
levels(datafbs.o$gender)[1] "female" "male"
5.1 Rename Column
datafbs.o <- datafbs.o %>%
rename(
fbs_raw = fbs,
hba1c = hba1c,
triglycerides = ftrigliz,
age = age,
waist_circumference = waist,
smoking_status = smoking,
sex = gender
)6 Create categorical outcome variable
We will use cut() function to create a categorical (factor) outcome variable. Variable fbs will be categorized into a 3 category variable renamed as cat_fbs.
Then we use the label() function to label the variable.
datafbs.o <- datafbs.o %>%
mutate(cat_fbs = cut(fbs_raw,
breaks = c(-Inf, 5.6, 7.0, Inf),
labels = c("normal", "prediabetes", "diabetes"),
right = FALSE))Let us checked if we have grouped it correctly
# Summarizing the min and max of 'fbs_raw' for each 'cat_fbs' group
datafbs.o %>%
select(cat_fbs, fbs_raw) %>%
group_by(cat_fbs) %>%
summarize(
min_fbs_raw = min(fbs_raw, na.rm = TRUE),
max_fbs_raw = max(fbs_raw, na.rm = TRUE)
)Next, we will used ordered function to create an ordinal variable and define the levels of the variable.
We will reverse the order of the outcome variable.
lev <- c('diabetes','prediabetes','normal')
lev[1] "diabetes" "prediabetes" "normal"
datafbs.o <- datafbs.o %>%
mutate(cat_fbs1 = fct_relevel(cat_fbs, lev)) %>%
mutate(cat_fbs1 = ordered(cat_fbs1, levels = lev))Let us check if we have done correctly:
str(datafbs.o$cat_fbs1) Ord.factor w/ 3 levels "diabetes"<"prediabetes"<..: 3 3 3 3 3 3 3 3 3 3 ...
levels(datafbs.o$cat_fbs)[1] "normal" "prediabetes" "diabetes"
table(datafbs.o$cat_fbs)
normal prediabetes diabetes
2640 889 563
levels(datafbs.o$cat_fbs1)[1] "diabetes" "prediabetes" "normal"
table(datafbs.o$cat_fbs1)
diabetes prediabetes normal
563 889 2640
7 Exploratory Data Analysis
datafbs.o %>%
select(-fbs_raw, -cat_fbs1) %>% # remove unwanted variables
tbl_summary(
by = cat_fbs,
type = list(
smoking_status ~ "categorical",
sex ~ "categorical"
),
statistic = list(
all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} ({p}%)"
),
digits = all_continuous() ~ 2,
missing = "no"
) %>%
add_overall() %>%
bold_labels()| Characteristic | Overall N = 4,0921 |
normal N = 2,6401 |
prediabetes N = 8891 |
diabetes N = 5631 |
|---|---|---|---|---|
| hba1c | 5.83 (1.46) | 5.38 (0.67) | 5.76 (0.96) | 8.05 (2.46) |
| triglycerides | 1.54 (1.07) | 1.37 (0.88) | 1.70 (1.10) | 2.11 (1.50) |
| age | 48.05 (14.48) | 45.50 (14.67) | 52.53 (13.35) | 52.98 (12.09) |
| waist_circumference | 86.44 (13.04) | 84.47 (12.93) | 89.21 (12.42) | 91.31 (12.49) |
| smoking_status | ||||
| never smoked | 3,136 (77%) | 2,035 (77%) | 669 (75%) | 432 (77%) |
| quitted smoking | 322 (7.9%) | 188 (7.1%) | 80 (9.0%) | 54 (9.6%) |
| still smoking | 634 (15%) | 417 (16%) | 140 (16%) | 77 (14%) |
| sex | ||||
| female | 2,664 (65%) | 1,757 (67%) | 554 (62%) | 353 (63%) |
| male | 1,428 (35%) | 883 (33%) | 335 (38%) | 210 (37%) |
| 1 Mean (SD); n (%) | ||||
7.1 Plots
Use histogram for numerical predictors, while bar plot for categorical predictors.
Hbaic
# Histogram of HbA1c by FBS category p1 <- ggplot(datafbs.o, aes(x = hba1c, fill = cat_fbs)) + geom_histogram(binwidth = 0.2, position = "dodge", alpha = 0.7, color = "black") + facet_wrap(~ cat_fbs, scales = "free_y") + labs( title = "Distribution of HbA1c by FBS Category", x = "HbA1c (%)", y = "Count", fill = "FBS Category" ) + theme_minimal() p1Triglycerides
# Histogram of Triglycerides by FBS category p2 <- ggplot(datafbs.o, aes(x = triglycerides, fill = cat_fbs)) + geom_histogram(binwidth = 0.2, position = "dodge", alpha = 0.7, color = "black") + facet_wrap(~ cat_fbs, scales = "free_y") + labs( title = "Distribution of Triglycerides by FBS Category", x = "triglycerides levels", y = "Count", fill = "FBS Category" ) + theme_minimal() p2Age
# Histogram of age by FBS category p3 <- ggplot(datafbs.o, aes(x = age, fill = cat_fbs)) + geom_histogram(binwidth = 0.2, position = "dodge", alpha = 0.7, color = "black") + facet_wrap(~ cat_fbs, scales = "free_y") + labs( title = "Distribution of Age by FBS Category", x = "Age", y = "Count", fill = "FBS Category" ) + theme_minimal() p3Waist Circumference
# Histogram of Waist Circumference by FBS category p4 <- ggplot(datafbs.o, aes(x = waist_circumference, fill = cat_fbs)) + geom_histogram(binwidth = 0.2, position = "dodge", alpha = 0.7, color = "black") + facet_wrap(~ cat_fbs, scales = "free_y") + labs( title = "Distribution of Waist Circumference by FBS Category", x = "Waist Circumference", y = "Count", fill = "FBS Category" ) + theme_minimal() p4Smoking Status
# Bar plot of smoking status by FBS category p5 <- ggplot(datafbs.o, aes(x = cat_fbs, fill = smoking_status)) + geom_bar(position = "dodge", alpha = 0.7) + labs(title = "Distribution of Smoking Status by FBS Category", x = "FBS Category", y = "Count", fill = "Smoking Status") + theme_minimal() p5Sex
# Bar plot of sex by FBS category p6 <- ggplot(datafbs.o, aes(x = cat_fbs, fill = sex)) + geom_bar(position = "dodge", alpha = 0.7) + labs(title = "Distribution of Sex by FBS Category", x = "FBS Category", y = "Count", fill = "Sex") + theme_minimal() p6
8 Estimation
8.1 Adjacent-category model or multinomial or baseline logit model
Here, we will show how to replicate the baseline logit model (unconstrained). We could not reproduce the adjancent-category models that is based on constrained logit models.
Generate a new variable but with no ordering.
datafbs.o <-
datafbs.o %>%
mutate(cat_fbs2 = fct_relevel(cat_fbs, lev))Run the mlogit
mlogit1 <-
vglm(cat_fbs ~ smoking_status, multinomial, data = datafbs.o)
summary(mlogit1)
Call:
vglm(formula = cat_fbs ~ smoking_status, family = multinomial,
data = datafbs.o)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept):1 1.54983 0.05297 29.256 < 2e-16 ***
(Intercept):2 0.43736 0.06172 7.086 1.38e-12 ***
smoking_statusquitted smoking:1 -0.30237 0.16323 -1.852 0.064 .
smoking_statusquitted smoking:2 -0.04432 0.18662 -0.237 0.812
smoking_statusstill smoking:1 0.13946 0.13488 1.034 0.301
smoking_statusstill smoking:2 0.16048 0.15472 1.037 0.300
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Names of linear predictors: log(mu[,1]/mu[,3]), log(mu[,2]/mu[,3])
Residual deviance: 7254.844 on 8178 degrees of freedom
Log-likelihood: -3627.422 on 8178 degrees of freedom
Number of Fisher scoring iterations: 4
No Hauck-Donner effect found in any of the estimates
Reference group is level 3 of the response
8.1.1 Calculate Confidence interval for the model
conf_intervals <- confint(mlogit1)
print(conf_intervals) 2.5 % 97.5 %
(Intercept):1 1.4459984 1.65365264
(Intercept):2 0.3163864 0.55833059
smoking_statusquitted smoking:1 -0.6222921 0.01755696
smoking_statusquitted smoking:2 -0.4100896 0.32145785
smoking_statusstill smoking:1 -0.1248981 0.40380863
smoking_statusstill smoking:2 -0.1427738 0.46373083
8.1.2 Get the RRR
exp(coef(mlogit1)) (Intercept):1 (Intercept):2
4.7106481 1.5486111
smoking_statusquitted smoking:1 smoking_statusquitted smoking:2
0.7390663 0.9566517
smoking_statusstill smoking:1 smoking_statusstill smoking:2
1.1496474 1.1740726
Build final table using provided values.
# Define function to compute 95% CI for RRR
calc_ci_rrr <- function(estimate, stderr) {
lower <- estimate - 1.96 * stderr
upper <- estimate + 1.96 * stderr
return(c(exp(lower), exp(upper)))
}
# Input full results manually
coefficients <- data.frame(
Term = c(
"(Intercept):1", "(Intercept):2",
"smoking_statusquitted smoking:1", "smoking_statusquitted smoking:2",
"smoking_statusstill smoking:1", "smoking_statusstill smoking:2"
),
Estimate = c(
1.54983, 0.43736,
-0.30237, -0.04432,
0.13946, 0.16048
),
StdError = c(
0.05297, 0.06172,
0.16323, 0.18662,
0.13488, 0.15472
),
ZValue = c(
29.256, 7.086,
-1.852, -0.237,
1.034, 1.037
),
PValue = c(
"<2e-16", "1.38e-12",
"0.064", "0.812",
"0.301", "0.300"
)
)
# Calculate RRR and 95% CI for RRR
coefficients <- coefficients %>%
mutate(
RRR = exp(Estimate),
CI_RRR = map2(Estimate, StdError, calc_ci_rrr)
) %>%
unnest_wider(CI_RRR, names_sep = "_") %>%
rename(
CI_RRR_Lower = CI_RRR_1,
CI_RRR_Upper = CI_RRR_2
)
# Output the table
kable(coefficients, digits = 3, caption = "Combined Coefficients and RRR Table with Confidence Intervals")| Term | Estimate | StdError | ZValue | PValue | RRR | CI_RRR_Lower | CI_RRR_Upper |
|---|---|---|---|---|---|---|---|
| (Intercept):1 | 1.550 | 0.053 | 29.256 | <2e-16 | 4.711 | 4.246 | 5.226 |
| (Intercept):2 | 0.437 | 0.062 | 7.086 | 1.38e-12 | 1.549 | 1.372 | 1.748 |
| smoking_statusquitted smoking:1 | -0.302 | 0.163 | -1.852 | 0.064 | 0.739 | 0.537 | 1.018 |
| smoking_statusquitted smoking:2 | -0.044 | 0.187 | -0.237 | 0.812 | 0.957 | 0.664 | 1.379 |
| smoking_statusstill smoking:1 | 0.139 | 0.135 | 1.034 | 0.301 | 1.150 | 0.883 | 1.498 |
| smoking_statusstill smoking:2 | 0.160 | 0.155 | 1.037 | 0.300 | 1.174 | 0.867 | 1.590 |
8.2 Continuation-ratio
To perform a continuation-ratio analysis for the cat_fbs variable (categorized as normal, prediabetes, and diabetes), we need to create two sequential binary comparisons that reflect the forward progression of glycemic status:
compare cat_fbs=diabetes vs cat_fbs=prediabetes.
This assesses the odds of progressing to diabetes among those already beyond the normal stage.
compare cat_fbs=diabetes and prediabetes vs cat_fbs=normal.
This evaluates the odds of having progressed beyond the normal glycemic state into either prediabetes or diabetes.
8.2.0.1 1. compare cat_fbs=diabetes vs cat_fbs=prediabetes
table(datafbs.o$cat_fbs) ; table(datafbs.o$cat_fbs1)
normal prediabetes diabetes
2640 889 563
diabetes prediabetes normal
563 889 2640
datafbs.o1 <-
datafbs.o %>%
filter(cat_fbs == 'diabetes' | cat_fbs == 'prediabetes')
cr1 <- glm(cat_fbs1 ~ smoking_status, family = binomial(link ='logit'),
data = datafbs.o1)
summary(cr1)
Call:
glm(formula = cat_fbs1 ~ smoking_status, family = binomial(link = "logit"),
data = datafbs.o1)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.43736 0.06172 7.086 1.38e-12 ***
smoking_statusquitted smoking -0.04432 0.18662 -0.237 0.812
smoking_statusstill smoking 0.16048 0.15472 1.037 0.300
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1939.1 on 1451 degrees of freedom
Residual deviance: 1937.8 on 1449 degrees of freedom
AIC: 1943.8
Number of Fisher Scoring iterations: 4
8.2.0.2 2. compare cat_fbs=diabetes and prediabetes vs cat_fbs=normal
For this we will recode variable fbs using ifelse function. Please, make sure the code will be 0 and 1. using code bigger will not make glm function work
datafbs.o2 <- datafbs.o %>%
filter(cat_fbs == 'diabetes' |
cat_fbs == 'prediabetes'|
cat_fbs == 'normal')
table(datafbs.o2$cat_fbs1)
diabetes prediabetes normal
563 889 2640
datafbs.o2 <- datafbs.o2 %>%
mutate(cat_fbs2 = ifelse(cat_fbs1 == "diabetes", 0,
ifelse(cat_fbs1 == "prediabetes",0,1)))
table(datafbs.o2$cat_fbs1) ; table(datafbs.o2$cat_fbs2)
diabetes prediabetes normal
563 889 2640
0 1
1452 2640
And run the next CR model:
cr2 <-
glm(cat_fbs2 ~ smoking_status, family = binomial(link ='logit'),
data = datafbs.o2)
summary(cr2)
Call:
glm(formula = cat_fbs2 ~ smoking_status, family = binomial(link = "logit"),
data = datafbs.o2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.61428 0.03741 16.419 <2e-16 ***
smoking_statusquitted smoking -0.27567 0.11909 -2.315 0.0206 *
smoking_statusstill smoking 0.03891 0.09168 0.424 0.6713
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 5322.8 on 4091 degrees of freedom
Residual deviance: 5317.0 on 4089 degrees of freedom
AIC: 5323
Number of Fisher Scoring iterations: 4
# Function to calculate confidence intervals
calc_ci <- function(estimate, stderr) {
lower <- estimate - 1.96 * stderr
upper <- estimate + 1.96 * stderr
return(c(lower, upper))
}
# Create cr1 results
cr1_results <- data.frame(
Model = "cr1",
Term = c("(Intercept)", "smoking_statusquitted smoking", "smoking_statusstill smoking"),
Estimate = c(0.43736, -0.04432, 0.16048),
StdError = c(0.06172, 0.18662, 0.15472),
ZValue = c(7.086, -0.237, 1.037),
PValue = c(1.38e-12, 0.812, 0.300)
) %>%
mutate(CI = purrr::map2(Estimate, StdError, calc_ci)) %>%
tidyr::unnest_wider(CI, names_sep = "_")
# Create cr2 results
cr2_results <- data.frame(
Model = "cr2",
Term = c("(Intercept)", "smoking_statusquitted smoking", "smoking_statusstill smoking"),
Estimate = c(0.61428, -0.27567, 0.03891),
StdError = c(0.03741, 0.11909, 0.09168),
ZValue = c(16.419, -2.315, 0.424),
PValue = c(2e-16, 0.0206, 0.6713)
) %>%
mutate(CI = purrr::map2(Estimate, StdError, calc_ci)) %>%
tidyr::unnest_wider(CI, names_sep = "_")
# Combine the data frames
combined_results <- bind_rows(cr1_results, cr2_results) %>%
rename(CI_Lower = CI_1, CI_Upper = CI_2)
# Print the combined table in a tidy format
kable(combined_results, format = "pipe", caption = "Combined Logistic Regression Results with Confidence Intervals")| Model | Term | Estimate | StdError | ZValue | PValue | CI_Lower | CI_Upper |
|---|---|---|---|---|---|---|---|
| cr1 | (Intercept) | 0.43736 | 0.06172 | 7.086 | 0.0000 | 0.3163888 | 0.5583312 |
| cr1 | smoking_statusquitted smoking | -0.04432 | 0.18662 | -0.237 | 0.8120 | -0.4100952 | 0.3214552 |
| cr1 | smoking_statusstill smoking | 0.16048 | 0.15472 | 1.037 | 0.3000 | -0.1427712 | 0.4637312 |
| cr2 | (Intercept) | 0.61428 | 0.03741 | 16.419 | 0.0000 | 0.5409564 | 0.6876036 |
| cr2 | smoking_statusquitted smoking | -0.27567 | 0.11909 | -2.315 | 0.0206 | -0.5090864 | -0.0422536 |
| cr2 | smoking_statusstill smoking | 0.03891 | 0.09168 | 0.424 | 0.6713 | -0.1407828 | 0.2186028 |
8.2.0.3 3. Conclusion
Model 1 Comparison: Diabetes (vs. Prediabetes)
Among individuals with elevated fasting blood sugar levels (diabetes or prediabetes), the odds of being in the diabetes group rather than prediabetes group were:
1.17 times higher for those who still smoke compared to non-smokers (OR = exp(0.16048) = 1.17; p = 0.300).
0.96 times lower (or 4% lower odds) for those who had quit smoking compared to non-smokers (OR = exp(-0.04432) = 0.96; p = 0.812).
However, both associations were not statistically significant.
Model 2 Comparison: Diabetes and Prediabetes (vs. Normal)
When comparing those with any elevated FBS (diabetes or prediabetes) to those with normal levels, the odds of being in the elevated group were:
0.76 times lower (or 24% reduced odds) among former smokers compared to non-smokers (OR = exp(-0.27567) = 0.76; p = 0.021) — this was statistically significant.
1.04 times higher for current smokers, but the association was not statistically significant (OR = exp(0.03891) = 1.04; p = 0.671).
The estimates from the continuation-ratio models are relatively consistent across the comparisons, especially for the effect of smoking. In general, former smokers were less likely to fall into worse FBS categories, while current smoking status did not show a strong or consistent effect. The significant association in the second model indicates that quitting smoking may reduce the odds of having elevated fasting blood sugar compared to maintaining normal levels.
8.3 Cumulative link logit model
This is also known as the proportional odds model.
Cumulative Logit Models (Proportional Odds Models) Another example of proportional odds models. We use the the variables that code from the lowest fbs (=<5.6/normal) to the highest fbs (>=7.0/diabetes)
# Calculate mean fbs for each category of cat_fbs
mean_fbs_by_category <- datafbs.o %>%
group_by(cat_fbs) %>%
summarize(mean_fbs = mean(fbs_raw, na.rm = TRUE))
# Display the result
print(mean_fbs_by_category)# A tibble: 3 × 2
cat_fbs mean_fbs
<fct> <dbl>
1 normal 4.56
2 prediabetes 6.12
3 diabetes 10.6
Recheck the levels to confirm that from smallest category of fbs to highest fbs.
For the cat_fbs categories, the mean fbs values are as follows:
cat_fbs (=<5.6/normal): mean fbs values = 4.6
cat_fbs (5.7-6.9/prediabetes): mean fbs values = 6.1
cat_fbs (>=7.0/diabetes): mean fbs values = 10.6
In Stata for example, the way we treat cumulative link logits is similar (not similar in the estimation process, just the context) to ordinal linear regression. That is we estimate how big the increase in the outcome variable with every unit increase in the predictor.
levels(datafbs.o$cat_fbs)[1] "normal" "prediabetes" "diabetes"
The variables:
outcome = cat_fbs
covariate = age
We will use ordinal::clm function
outcome: cat_fbs
covariate: age
o.age <- clm(cat_fbs ~ age, data=datafbs.o)
summary(o.age)formula: cat_fbs ~ age
data: datafbs.o
link threshold nobs logLik AIC niter max.grad cond.H
logit flexible 4092 -3516.47 7038.95 5(0) 1.49e-10 8.1e+04
Coefficients:
Estimate Std. Error z value Pr(>|z|)
age 0.034294 0.002334 14.7 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Threshold coefficients:
Estimate Std. Error z value
normal|prediabetes 2.2862 0.1223 18.69
prediabetes|diabetes 3.5736 0.1300 27.49
These are
the log odds ratio and
the odss ratio
tidy(o.age, conf.int = TRUE)tidy(o.age, exponentiate = TRUE ,conf.int = TRUE)8.3.1 Interpretation
The coefficient β̂age=0.0342942 . This shows that the older the age, then the value of fbs also increases.
For every 1-year increase in age, the odds of being in a higher fbs category increased by about 1.3%.
For every 10-year increase in age, the odds of being in a higher fbs category increase by about 13.9%.
exp(1 * 0.013)[1] 1.013085
exp(10 * 0.013)[1] 1.138828
For every 1-year increase in age, the odds of being in a lower fbs category reduced by about 1.3%.
For every 10-year increase in age, the odds of being in a lower fbs category reduced by about 12.2%.
exp(1 * -0.013)[1] 0.9870841
exp(10 * -0.013)[1] 0.8780954
9 Prediction
9.1 Predicted probability
Use polr package
polr in MASS package
polr_cr3 <- MASS::polr(cat_fbs ~ age, data = datafbs.o, Hess = TRUE)
summary(polr_cr3)Call:
MASS::polr(formula = cat_fbs ~ age, data = datafbs.o, Hess = TRUE)
Coefficients:
Value Std. Error t value
age 0.03429 0.002334 14.69
Intercepts:
Value Std. Error t value
normal|prediabetes 2.2862 0.1223 18.6862
prediabetes|diabetes 3.5736 0.1300 27.4844
Residual Deviance: 7032.948
AIC: 7038.948
Then the probabilities are:
prob_polr <- predict(polr_cr3, type = 'probs')
head(prob_polr) ; tail(prob_polr) normal prediabetes diabetes
1 0.6850904 0.2023296 0.11258001
2 0.5228294 0.2759640 0.20120661
3 0.7068529 0.1904433 0.10270380
4 0.7540272 0.1633776 0.08259527
5 0.5737448 0.2561011 0.17015413
6 0.7208610 0.1825860 0.09655301
normal prediabetes diabetes
4087 0.6924417 0.1983592 0.10919907
4088 0.6391092 0.2260582 0.13483260
4089 0.6850904 0.2023296 0.11258001
4090 0.6069055 0.2414437 0.15165076
4091 0.4799882 0.2898325 0.23017931
4092 0.7277091 0.1786883 0.09360262
# Create a new data frame with age, smoke, and gender
newdat <- cbind(datafbs.o[, c("age", "smoking_status","sex")], prob_polr)
# Reshape the data to long format
lnewdat <- melt(newdat, id.vars = c("age", "smoking_status", "sex"),
variable.name = "fbs_category", value.name = "Probability")
# View the first few rows
head(lnewdat)ggplot(lnewdat, aes(x = age, y = Probability, colour = fbs_category)) +
geom_line() + facet_grid(sex ~ smoking_status, labeller="label_both")9.2 Manual calculation for prediction
summary(o.age)formula: cat_fbs ~ age
data: datafbs.o
link threshold nobs logLik AIC niter max.grad cond.H
logit flexible 4092 -3516.47 7038.95 5(0) 1.49e-10 8.1e+04
Coefficients:
Estimate Std. Error z value Pr(>|z|)
age 0.034294 0.002334 14.7 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Threshold coefficients:
Estimate Std. Error z value
normal|prediabetes 2.2862 0.1223 18.69
prediabetes|diabetes 3.5736 0.1300 27.49
And we want to predict these data
To predict the probabilities of i-th falls into each category for model for a new data, we need to create a new data first. We will use expand.grid() to create a new data of class data.frame
This is the new data
newData <- expand.grid(age = unique(datafbs.o$age))
head(newData)And now the predictions
lp.o.age <- predict(o.age, newdata = newData, type = 'linear.predictor')
lp.o.age$eta1
normal prediabetes diabetes
1 0.77726489 2.0646541 99998.49
2 0.09138080 1.3787700 99997.81
3 0.88014751 2.1675367 99998.59
4 1.12020694 2.4075961 99998.83
5 0.29714603 1.5845352 99998.01
6 0.94873592 2.2361251 99998.66
7 0.64008808 1.9274773 99998.35
8 1.53173740 2.8191266 99999.25
9 0.98303012 2.2704193 99998.70
10 0.67438228 1.9617715 99998.39
11 1.15450115 2.4418903 99998.87
12 1.60032581 2.8877150 99999.31
13 0.53720546 1.8245946 99998.25
14 0.74297069 2.0303599 99998.46
15 1.01732433 2.3047135 99998.73
16 0.40002864 1.6874178 99998.11
17 0.84585330 2.1332425 99998.56
18 0.26285182 1.5502410 99997.98
19 0.70867648 1.9960657 99998.42
20 1.56603160 2.8534208 99999.28
21 0.02279239 1.3101816 99997.74
22 1.63462001 2.9220092 99999.35
23 1.18879535 2.4761845 99998.90
24 1.05161853 2.3390077 99998.77
25 1.08591274 2.3733019 99998.80
26 0.57149967 1.8588888 99998.29
27 1.22308956 2.5104787 99998.94
28 0.91444171 2.2018309 99998.63
29 1.25738376 2.5447729 99998.97
30 1.39456058 2.6819498 99999.11
31 0.33144023 1.6188294 99998.05
32 1.42885478 2.7162440 99999.14
33 1.49744319 2.7848324 99999.21
34 0.46861705 1.7560062 99998.18
35 1.32597217 2.6133613 99999.04
36 1.46314899 2.7505382 99999.18
37 -0.18297284 1.1044163 99997.53
38 0.60579387 1.8931830 99998.32
39 -0.11438443 1.1730048 99997.60
40 0.15996921 1.4473584 99997.87
41 -0.01150181 1.2758874 99997.70
42 -0.25156125 1.0358279 99997.46
43 -0.08009022 1.2072990 99997.63
44 1.29167796 2.5790671 99999.01
45 -0.04579602 1.2415932 99997.67
46 0.36573444 1.6531236 99998.08
47 1.66891422 2.9563034 99999.38
48 0.81155910 2.0989483 99998.53
49 1.36026637 2.6476556 99999.07
50 0.50291126 1.7903004 99998.22
51 -0.14867863 1.1387105 99997.57
52 0.12567501 1.4130642 99997.84
53 -0.52591488 0.7614743 99997.19
54 -0.28585545 1.0015337 99997.43
55 0.43432285 1.7217120 99998.15
56 -0.32014966 0.9672395 99997.39
57 0.19426341 1.4816526 99997.91
58 -0.45732647 0.8300627 99997.26
59 -0.38873806 0.8986511 99997.33
60 0.05708660 1.3444758 99997.77
61 0.22855762 1.5159468 99997.94
62 -0.21726704 1.0701221 99997.50
63 -0.62879750 0.6585917 99997.08
64 -0.42303227 0.8643569 99997.29
65 -0.49162068 0.7957685 99997.22
66 -0.59450329 0.6928859 99997.12
67 -0.35444386 0.9329453 99997.36
68 -0.76597432 0.5214149 99996.95
69 -0.66309170 0.6242975 99997.05
70 -0.56020909 0.7271801 99997.15
71 -0.73168011 0.5557091 99996.98
$eta2
normal prediabetes diabetes
1 -100001.5 0.77726489 2.0646541
2 -100002.2 0.09138080 1.3787700
3 -100001.4 0.88014751 2.1675367
4 -100001.2 1.12020694 2.4075961
5 -100002.0 0.29714603 1.5845352
6 -100001.3 0.94873592 2.2361251
7 -100001.6 0.64008808 1.9274773
8 -100000.8 1.53173740 2.8191266
9 -100001.3 0.98303012 2.2704193
10 -100001.6 0.67438228 1.9617715
11 -100001.1 1.15450115 2.4418903
12 -100000.7 1.60032581 2.8877150
13 -100001.7 0.53720546 1.8245946
14 -100001.5 0.74297069 2.0303599
15 -100001.3 1.01732433 2.3047135
16 -100001.9 0.40002864 1.6874178
17 -100001.4 0.84585330 2.1332425
18 -100002.0 0.26285182 1.5502410
19 -100001.6 0.70867648 1.9960657
20 -100000.7 1.56603160 2.8534208
21 -100002.3 0.02279239 1.3101816
22 -100000.7 1.63462001 2.9220092
23 -100001.1 1.18879535 2.4761845
24 -100001.2 1.05161853 2.3390077
25 -100001.2 1.08591274 2.3733019
26 -100001.7 0.57149967 1.8588888
27 -100001.1 1.22308956 2.5104787
28 -100001.4 0.91444171 2.2018309
29 -100001.0 1.25738376 2.5447729
30 -100000.9 1.39456058 2.6819498
31 -100002.0 0.33144023 1.6188294
32 -100000.9 1.42885478 2.7162440
33 -100000.8 1.49744319 2.7848324
34 -100001.8 0.46861705 1.7560062
35 -100001.0 1.32597217 2.6133613
36 -100000.8 1.46314899 2.7505382
37 -100002.5 -0.18297284 1.1044163
38 -100001.7 0.60579387 1.8931830
39 -100002.4 -0.11438443 1.1730048
40 -100002.1 0.15996921 1.4473584
41 -100002.3 -0.01150181 1.2758874
42 -100002.5 -0.25156125 1.0358279
43 -100002.4 -0.08009022 1.2072990
44 -100001.0 1.29167796 2.5790671
45 -100002.3 -0.04579602 1.2415932
46 -100001.9 0.36573444 1.6531236
47 -100000.6 1.66891422 2.9563034
48 -100001.5 0.81155910 2.0989483
49 -100000.9 1.36026637 2.6476556
50 -100001.8 0.50291126 1.7903004
51 -100002.4 -0.14867863 1.1387105
52 -100002.2 0.12567501 1.4130642
53 -100002.8 -0.52591488 0.7614743
54 -100002.6 -0.28585545 1.0015337
55 -100001.9 0.43432285 1.7217120
56 -100002.6 -0.32014966 0.9672395
57 -100002.1 0.19426341 1.4816526
58 -100002.7 -0.45732647 0.8300627
59 -100002.7 -0.38873806 0.8986511
60 -100002.2 0.05708660 1.3444758
61 -100002.1 0.22855762 1.5159468
62 -100002.5 -0.21726704 1.0701221
63 -100002.9 -0.62879750 0.6585917
64 -100002.7 -0.42303227 0.8643569
65 -100002.8 -0.49162068 0.7957685
66 -100002.9 -0.59450329 0.6928859
67 -100002.6 -0.35444386 0.9329453
68 -100003.1 -0.76597432 0.5214149
69 -100002.9 -0.66309170 0.6242975
70 -100002.8 -0.56020909 0.7271801
71 -100003.0 -0.73168011 0.5557091
The coefficients for model o.age
coef.o.age<- coef(o.age)
coef.o.age normal|prediabetes prediabetes|diabetes age
2.2862099 3.5735991 0.0342942
age_value <- 44
lp.o1.bx <- coef.o.age['age'] * age_valuePutting them inside the equation
lp.o1.bx <- 0.0342942 * 44
lp.o1.bx <- 1.5089448And complete the Eq 8.25 in Hosmer
logit1 <- coef.o.age['normal|prediabetes'] - lp.o1.bx
logit2 <- coef.o.age['prediabetes|diabetes'] - lp.o1.bxThen the probabilities are
pLeq1 <- 1 / (1 + exp(-logit1)) # p(Y <= first threshold)
pLeq2 <- 1 / (1 + exp(-logit2)) # p(Y <= second threshold)
pLeq1 <- 1 / (1 + exp(-0.7772651)) # = 0.6850903
pLeq2 <- 1 / (1 + exp(-2.0646543)) # = 0.8874200
pMat <- cbind(
p1 = pLeq1, # Probability of being in the first category
p2 = pLeq2 - pLeq1, # Probability of being in the second category
p3 = 1 - pLeq2 # Probability of being in the third category
)
pMat p1 p2 p3
[1,] 0.6850904 0.2023296 0.11258
Let us confirm with the prediction made by clm()
predict(o.age, newdata = newData, type = 'prob')$fit
normal prediabetes diabetes
1 0.6850903 0.2023296 0.11258002
2 0.5228293 0.2759641 0.20120662
3 0.7068528 0.1904434 0.10270382
4 0.7540271 0.1633776 0.08259529
5 0.5737447 0.2561012 0.17015414
6 0.7208609 0.1825861 0.09655302
7 0.6547734 0.2181966 0.12703007
8 0.8222604 0.1214403 0.05629932
9 0.7277090 0.1786883 0.09360263
10 0.6624837 0.2142408 0.12327546
11 0.7603321 0.1596343 0.08003362
12 0.8320639 0.1151719 0.05276421
13 0.6311621 0.2299544 0.13888347
14 0.6776451 0.2063029 0.11605200
15 0.7344511 0.1748156 0.09073334
16 0.5986945 0.2451897 0.15611572
17 0.6996966 0.1943959 0.10590756
18 0.5653372 0.2596113 0.17505146
19 0.6701086 0.2102747 0.11961662
20 0.8272171 0.1182781 0.05450476
21 0.5056979 0.2818457 0.21245646
22 0.8368016 0.1121222 0.05107623
23 0.7665255 0.1559298 0.07754469
24 0.7410856 0.1709709 0.08794347
25 0.7476113 0.1671574 0.08523135
26 0.6391091 0.2260582 0.13483262
27 0.7726068 0.1522664 0.07512684
28 0.7139082 0.1865056 0.09958619
29 0.7785754 0.1486462 0.07277843
30 0.8013193 0.1346338 0.06404690
31 0.5821098 0.2525239 0.16536637
32 0.8067228 0.1312556 0.06202161
33 0.8171928 0.1246578 0.05814933
34 0.6150564 0.2376524 0.14729124
35 0.7901736 0.1415430 0.06828344
36 0.8120138 0.1279299 0.06005626
37 0.4543840 0.2967027 0.24891332
38 0.6469807 0.2221373 0.13088197
39 0.4714350 0.2922527 0.23631229
40 0.5399072 0.2696843 0.19040845
41 0.4971246 0.2846243 0.21825110
42 0.4374393 0.3006049 0.26195580
43 0.4799881 0.2898325 0.23017932
44 0.7844311 0.1450711 0.07049783
45 0.4885530 0.2872882 0.22415879
46 0.5904279 0.2488849 0.16068723
47 0.8414310 0.1091296 0.04943944
48 0.6924416 0.1983593 0.10919909
49 0.7958030 0.1380634 0.06613366
50 0.6231432 0.2338209 0.14303589
51 0.4628987 0.2945442 0.24255718
52 0.5313775 0.2728713 0.19575120
53 0.3714702 0.3102036 0.31832627
54 0.4290188 0.3023412 0.26863998
55 0.6069055 0.2414438 0.15165078
56 0.4206393 0.3039297 0.27543107
57 0.5484137 0.2664084 0.18517793
58 0.3876202 0.3087479 0.30363181
59 0.4040211 0.3066511 0.28932777
60 0.5142678 0.2789572 0.20677498
61 0.5568920 0.2630489 0.18005915
62 0.4458959 0.2987242 0.25537986
63 0.3477833 0.3111607 0.34105604
64 0.3957914 0.3077788 0.29642987
65 0.3795119 0.3095567 0.31093140
66 0.3556023 0.3110063 0.33339140
67 0.4123052 0.3053672 0.28232756
68 0.3173506 0.3101280 0.37252145
69 0.3400454 0.3111499 0.34880469
70 0.3634991 0.3106871 0.32581384
71 0.3248261 0.3106330 0.36454088
9.3 Checking proportional odds assumption
brant(polr_cr3)--------------------------------------------
Test for X2 df probability
--------------------------------------------
Omnibus 8.34 1 0
age 8.34 1 0
--------------------------------------------
H0: Parallel Regression Assumption holds
10 Results
datafbs.o %>%
select(-fbs_raw, -cat_fbs1, -cat_fbs2) %>% # remove unwanted variables
tbl_summary(
by = cat_fbs,
type = list(
smoking_status ~ "categorical",
sex ~ "categorical"
),
statistic = list(
all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} ({p}%)"
),
digits = all_continuous() ~ 2,
missing = "no"
) %>%
add_overall() %>%
bold_labels() %>%
modify_caption("**Table 1: Characteristics of FBS Categories**")| Characteristic | Overall N = 4,0921 |
normal N = 2,6401 |
prediabetes N = 8891 |
diabetes N = 5631 |
|---|---|---|---|---|
| hba1c | 5.83 (1.46) | 5.38 (0.67) | 5.76 (0.96) | 8.05 (2.46) |
| triglycerides | 1.54 (1.07) | 1.37 (0.88) | 1.70 (1.10) | 2.11 (1.50) |
| age | 48.05 (14.48) | 45.50 (14.67) | 52.53 (13.35) | 52.98 (12.09) |
| waist_circumference | 86.44 (13.04) | 84.47 (12.93) | 89.21 (12.42) | 91.31 (12.49) |
| smoking_status | ||||
| never smoked | 3,136 (77%) | 2,035 (77%) | 669 (75%) | 432 (77%) |
| quitted smoking | 322 (7.9%) | 188 (7.1%) | 80 (9.0%) | 54 (9.6%) |
| still smoking | 634 (15%) | 417 (16%) | 140 (16%) | 77 (14%) |
| sex | ||||
| female | 2,664 (65%) | 1,757 (67%) | 554 (62%) | 353 (63%) |
| male | 1,428 (35%) | 883 (33%) | 335 (38%) | 210 (37%) |
| 1 Mean (SD); n (%) | ||||
data <- data.frame(
Smoking_Status = c("Quitted Smoking", "Still Smoking"),
Coefficient_normal_vs_diabetes = c(-0.30237, 0.13946),
SE_normal_vs_diabetes = c(0.16323, 0.13488),
p_value_normal_vs_diabetes = c(0.064, 0.301),
CI_normal_vs_diabetes = c("(-0.62, 0.02)", "(-0.12, 0.40)"),
Coefficient_prediabetes_vs_diabetes = c(-0.04432, 0.16048),
SE_prediabetes_vs_diabetes = c(0.18662, 0.15472),
p_value_prediabetes_vs_diabetes = c(0.812, 0.300),
CI_prediabetes_vs_diabetes = c("(-0.41, 0.32)", "(-0.14, 0.46)")
)
# Display the table
kable(data,
col.names = c("Smoking Status",
"Coefficient", "SE", "p-value", "95% CI",
"Coefficient", "SE", "p-value", "95% CI"),
caption = "Table 2: Adjacent-category model or multinomial or baseline logit model") %>%
kable_styling(full_width = FALSE, position = "center") %>%
add_header_above(c(" " = 1,
"Normal vs Diabetes" = 4,
"Prediabetes vs Diabetes" = 4)) %>%
column_spec(1, bold = TRUE) %>%
row_spec(0, bold = TRUE)| Smoking Status | Coefficient | SE | p-value | 95% CI | Coefficient | SE | p-value | 95% CI |
|---|---|---|---|---|---|---|---|---|
| Quitted Smoking | -0.30237 | 0.16323 | 0.064 | (-0.62, 0.02) | -0.04432 | 0.18662 | 0.812 | (-0.41, 0.32) |
| Still Smoking | 0.13946 | 0.13488 | 0.301 | (-0.12, 0.40) | 0.16048 | 0.15472 | 0.300 | (-0.14, 0.46) |
ggplot(lnewdat, aes(x = age, y = Probability, colour = fbs_category)) +
geom_line() + facet_grid(sex ~ smoking_status, labeller="label_both")11 Intepretation
11.1 1. Adjacent-category model or multinomial or baseline logit model
11.1.1 Comparison of “Normal” vs. “Diabetes”
When compared to individuals who never smoked, the log odds of being in the “Normal” blood sugar category (vs. “Diabetes”) for those who quitted smoking changes by –0.30 (95% CI: –0.62 to 0.02, p-value = 0.064).
This suggests that those who quitted smoking have lower odds of being in the “Normal” group compared to the “Diabetes” group, but the association is not statistically significant. The confidence interval includes 0, indicating that the true effect could be null or even in the opposite direction.
For individuals who are still smoking, the log odds change by 0.14 (95% CI: –0.12 to 0.40, p-value = 0.301).
This indicates that current smokers have slightly higher odds of being in the “Normal” group relative to the “Diabetes” group compared to never smokers, but again, the result is not statistically significant, and the confidence interval includes 0.
11.1.2 Comparison of “Prediabetes” vs. “Diabetes”
When compared to individuals who never smoked, the log odds of being in the “Prediabetes” category (vs. “Diabetes”) for those who quitted smoking changes by –0.04 (95% CI: –0.41 to 0.32, p-value = 0.812).
This indicates no meaningful difference in odds between quitted smokers and never smokers for being in the “Prediabetes” vs. “Diabetes” category. The estimate is very close to 0 and the confidence interval is wide, suggesting high uncertainty.
For individuals who are still smoking, the log odds change by 0.16 (95% CI: –0.14 to 0.46, p-value = 0.300).
This indicates no significant association between current smoking and the odds of being in the “Prediabetes” category versus the “Diabetes” category when compared to never smokers. The confidence interval spans both negative and positive values.
11.2 2. Plot of Predicted Probabilities of fbs category by Age Faceted by Smoking status and gender
Age Effect
Across all panels, as age increases, the probability of being in the “normal” fasting blood sugar category (red line) decreases steadily, while the probability of being in the “prediabetes” (green) and “diabetes” (blue) categories increases.
This trend is consistent for both males and females, indicating that older age is associated with higher likelihood of abnormal fasting blood sugar levels, regardless of smoking status.
Smoking Status Effect
Never Smoked (left column):
Individuals who never smoked consistently show the highest probabilities of being in the “normal” category and the lowest probabilities of diabetes, especially at younger ages. This suggests a protective pattern for never smokers across the lifespan.Quitted Smoking (middle column):
Compared to never smokers, individuals who quitted smoking show a modestly lower probability of remaining in the normal range and higher probabilities of prediabetes and diabetes, especially at older ages. The gap between normal and abnormal categories narrows faster with age than in never smokers.Still Smoking (right column):
Current smokers exhibit the lowest probabilities of being in the normal category across all ages. The probability of diabetes increases more sharply with age, particularly for males, suggesting higher long-term metabolic risk among current smokers.
Sex Differences
Across all smoking categories, females tend to maintain higher probabilities of being in the normal category and lower probabilities of diabetes compared to males.
Males show a more pronounced age-related decline in the probability of normal blood sugar and a steeper increase in diabetes probability.